

Search for: All records

Creators/Authors contains: "Croft, W. Bruce"


  1. Asking clarifying questions in response to ambiguous or faceted queries has been recognized as a useful technique for various information retrieval systems, in particular, conversational search systems with limited bandwidth interfaces. Analyzing and generating clarifying questions have recently been studied in the literature. However, accurate utilization of user responses to clarifying questions has been relatively less explored. In this paper, we propose a neural network model based on a novel attention mechanism, called multi source attention network. Our model learns a representation for a user-system conversation that includes clarifying questions. In more detail, with the help of multiple information sources, our model weights each term in the conversation. In our experiments, we use two separate external sources, including the top retrieved documents and a set of different possible clarifying questions for the query. We implement the proposed representation learning model for two downstream tasks in conversational search: document retrieval and next clarifying question selection. We evaluate our models using a public dataset for search clarification. Our experiments demonstrate significant improvements compared to competitive baselines.
  2. Existing learning to rank models for information retrieval are trained based on explicit or implicit query-document relevance information. In this paper, we study the task of learning a retrieval model based on user-item interactions. Our model has potential applications to systems with rich user-item interaction data, such as browsing and recommendation, in which having an accurate search engine is desired. This includes media streaming services and e-commerce websites among others. Inspired by the neural approaches to collaborative filtering and the language modeling approaches to information retrieval, our model is jointly optimized to predict user-item interactions and reconstruct the item textual descriptions. In more detail, our model learns user and item representations such that they can accurately predict future user-item interactions, while generating an effective unigram language model for each item. Our experiments on four diverse datasets in the context of movie and product search and recommendation demonstrate that our model substantially outperforms competitive retrieval baselines, in addition to providing comparable performance to state-of-the-art hybrid recommendation models.
  3. Conversational search is one of the ultimate goals of information retrieval. Recent research approaches conversational search through simplified settings of response ranking and conversational question answering, where an answer is either selected from a given candidate set or extracted from a given passage. These simplifications neglect the fundamental role of retrieval in conversational search. To address this limitation, we introduce an open-retrieval conversational question answering (ORConvQA) setting, where we learn to retrieve evidence from a large collection before extracting answers, as a further step towards building functional conversational search systems. We create a dataset, OR-QuAC, to facilitate research on ORConvQA. We build an end-to-end system for ORConvQA, featuring a retriever, a reranker, and a reader that are all based on Transformers. Our extensive experiments on OR-QuAC demonstrate that a learnable retriever is crucial for ORConvQA. We further show that our system can make a substantial improvement when we enable history modeling in all system components. Moreover, we show that the reranker component contributes to the model performance by providing a regularization effect. Finally, further in-depth analyses are performed to provide new insights into ORConvQA.
  4. null (Ed.)
  5. Estimating the quality of a result list, often referred to as query performance prediction (QPP), is a challenging and important task in information retrieval. It can be used as feedback to users, search engines, and system administrators. Although predicting the performance of retrieval models has been extensively studied for the ad-hoc retrieval task, the effectiveness of performance prediction methods for question answering (QA) systems is relatively unstudied. The short length of answers, the dominance of neural models in QA, and the re-ranking nature of most QA systems make performance prediction for QA a unique, important, and technically interesting task. In this paper, we introduce and motivate the task of performance prediction for non-factoid question answering and propose a neural performance predictor for this task. Our experiments on two recent datasets demonstrate that the proposed model outperforms competitive baselines in all settings. 
  6. Intelligent assistants change the way people interact with computers and make it possible for people to search for products through conversations when they have purchase needs. During the interactions, the system could ask questions on certain aspects of the ideal products to clarify the users' needs. Previous work proposed to ask users the exact characteristics of their ideal items before showing results. However, users may not have clear ideas about what an ideal item should be like, especially when they have not seen any items. So it is more feasible to facilitate the conversational search by showing example items and asking for feedback instead. In addition, when the users provide negative feedback for the presented items, it is easier to collect their detailed feedback on certain properties (aspect-value pairs) of the non-relevant items. By breaking down the item-level negative feedback to fine-grained feedback on aspect-value pairs, more information is available to help clarify users' intents. So in this paper, we propose a conversational paradigm for product search driven by non-relevant items, based on which fine-grained feedback is collected and utilized to show better results in the next iteration. We then propose an aspect-value likelihood model to incorporate both positive and negative feedback on fine-grained aspect-value pairs of the non-relevant items. Experimental results show that our model is significantly better than state-of-the-art product search baselines without using feedback and baselines using item-level negative feedback.
  7. Users often fail to formulate their complex information needs in a single query. As a consequence, they need to scan multiple result pages and/or reformulate their queries, which is a frustrating experience. Alternatively, systems can improve user satisfaction by proactively asking users questions to clarify their information needs. Asking clarifying questions is especially important in information-seeking conversational systems, since they can only return a limited number (often only one) of results. In this paper, we formulate the task of asking clarifying questions in open-domain information retrieval. We propose an offline evaluation methodology for the task. In this research, we create a dataset, called Qulac, through crowdsourcing. Our dataset is based on the TREC Web Track 2009-2012 data and consists of over 10K question-answer pairs for 198 TREC topics with 762 facets. Our experiments on an oracle model demonstrate that asking only one good question leads to over 100% retrieval performance improvement, which clearly demonstrates the potential impact of the task. We further propose a neural model for selecting clarifying questions based on the original query and the previous question-answer interactions. Our model significantly outperforms competitive baselines. To foster research in this area, we have made Qulac publicly available.
  8. Conversational AI is a rapidly developing research field in both industry and academia. As one of the major branches of conversational AI, question answering and conversational search have attracted significant attention from researchers in the information retrieval community. It has been a long-overdue feature for search engines or conversational assistants to retrieve information iteratively and interactively in a conversational manner. Previous work argues that conversational question answering (ConvQA) is a simplified but concrete setting of conversational search. In this setting, one of the major challenges is to leverage the conversation history to understand and answer the current question. In this work, we propose a novel solution for ConvQA that involves three aspects. First, we propose a positional history answer embedding method to encode conversation history with position information using BERT (Bidirectional Encoder Representations from Transformers) in a natural way. BERT is a powerful technique for text representation. Second, we design a history attention mechanism (HAM) to conduct a "soft selection" for conversation histories. This method attends to history turns with different weights based on how helpful they are in answering the current question. Third, in addition to handling conversation history, we take advantage of multi-task learning (MTL) to do answer prediction along with another essential conversation task (dialog act prediction) using a uniform model architecture. MTL is able to learn more expressive and generic representations to improve the performance of ConvQA. We demonstrate the effectiveness of our model with extensive experimental evaluations on QuAC, a large-scale ConvQA dataset. We show that position information plays an important role in conversation history modeling. We also visualize the history attention and provide new insights into conversation history understanding. The complete implementation of our model will be open-sourced.
  9. Conversational search is an emerging topic in the information retrieval community. One of the major challenges to multi-turn conversational search is to model the conversation history to understand the current question. Existing methods either prepend history turns to the current question or use complicated attention mechanisms to model the history. We propose a conceptually simple yet highly effective approach referred to as history answer embedding. It enables seamless integration of conversation history into a conversational question answering (ConvQA) model built on BERT (Bidirectional Encoder Representations from Transformers). We first explain our view that ConvQA is a simplified but concrete setting of conversational search, and then we provide a general framework to solve ConvQA. We further demonstrate the effectiveness of our approach under this framework. Finally, we analyze the impact of different numbers of history turns under different settings. We show that history prepending methods degrade dramatically when given a long conversation history while our method is robust and shows advantages under such a situation, which provides new insights into conversation history modeling in ConvQA. 
  10. Intelligent personal assistant systems, with either text-based or voice-based conversational interfaces, are becoming increasingly popular. Most previous research has used either retrieval-based or generation-based methods. Retrieval-based methods have the advantage of returning fluent and informative responses with great diversity. The retrieved responses are easier to control and explain. However, the response retrieval performance is limited by the size of the response repository. On the other hand, although generation-based methods can return highly coherent responses given conversation context, they are likely to return universal or general responses with insufficient ground knowledge information. In this paper, we build a hybrid neural conversation model with the capability of both response retrieval and generation, in order to combine the merits of these two types of methods. Experimental results on Twitter and Foursquare data show that the proposed model can outperform both retrieval-based methods and generation-based methods (including a recently proposed knowledge-grounded neural conversation model) under both automatic evaluation metrics and human evaluation. Our models and research findings provide new insights on how to integrate text retrieval and text generation models for building conversation systems. 